ABSTRACT

Iterative decoding was not originally introduced as the solution to
an optimization problem, which makes the analysis of its convergence
very difficult. In this paper, we investigate the link between iterative
decoding and classical optimization techniques. We first show that
iterative decoding can be rephrased as two embedded minimization
processes involving the Fermi-Dirac distance. Based on this new
formulation, a hybrid proximal point algorithm is first derived with
the additional advantage of decreasing a desired criterion. In a second
part, a hybrid minimum entropy algorithm is proposed with
improved performance compared to classical iterative decoding.
Even though this paper focuses on iterative decoding for BICM, the
results can be applied to the large class of turbo-like decoders.
1. INTRODUCTION



Bit-Interleaved Coded Modulation (BICM) was first suggested by
Zehavi in [1] to improve the performance of Trellis Coded Modulation
over Rayleigh-fading channels. In BICM, the diversity order is
increased by using bit interleavers instead of symbol interleavers. This
improvement is achieved at the expense of a reduced minimum
Euclidean distance, leading to a degradation over non-fading Gaussian
channels [1]. This drawback can be overcome by using iterative
decoding (BICM-ID) at the receiver [2]. BICM-ID is known to provide
excellent performance for both Gaussian and fading channels.
The iterative decoding scheme used in BICM-ID is very similar
to that of serially concatenated turbo decoders. Indeed, the serial
turbo decoder makes use of an exchange of information between
computationally efficient decoders for each of the component codes. In
BICM-ID, the inner decoder is replaced by demapping, which is less
computationally demanding than a decoding step. Even though this
paper focuses on iterative decoding for BICM, the results can be
applied to the large class of iterative decoders, including serial or
parallel concatenated turbo decoders as well as low-density parity-check
(LDPC) decoders. Among the different attempts to provide an
analysis of iterative decoding, the EXIT chart analysis and density
evolution have enabled significant progress [3], [4], but the results
developed within this setting apply only in the case of large block
length. Another tool of analysis is the connection of iterative
decoding to factor graphs [5] and belief propagation [6]. Convergence
results for belief propagation exist but are limited to the case where
the corresponding graph is a tree, which excludes turbo codes
and LDPC codes. A link between iterative decoding and classical
optimization algorithms has been made recently in [7], where turbo
decoding is interpreted as a nonlinear block Gauss-Seidel iteration. In
parallel, a geometrical approach has been considered and provides
an interesting interpretation in terms of projections. The particular
case of BICM decoding has been studied in [8], [9]. In [10], turbo
decoding is interpreted in a geometric setting as a dynamical system,
leading to new but incomplete results.

In this paper, we reformulate iterative decoding as two embedded
proximal point algorithms involving the Bregman divergence built
on the Fermi-Dirac energy. We prove that each iteration of the
decoding decreases a certain criterion. We also propose a hybrid
minimum entropy algorithm with improved performance compared to
the classical BICM.



2. BICM-ID WITH SOFT DECISION FEEDBACK 

A conventional BICM system [11] is built from a serial concatenation
of a convolutional encoder, a bit interleaver and an $M$-ary
bits-to-symbol mapping (where $M = 2^m$), as shown in Fig. 1.
The sequence of information bits $\mathbf{b}$ is first encoded by a
convolutional encoder to produce the output encoded bit sequence
$\mathbf{c}$ of length $L_c$, which is then scrambled by a bit
interleaver (as opposed to the channel symbols in the symbol-interleaved
coded sequence) operating on bit indexes. Let $\mathbf{d}$ denote the
interleaved sequence. Then, $m$ consecutive bits of $\mathbf{d}$ are
grouped as a channel symbol $\mathbf{d}_k = (d_{km+1}, \ldots, d_{(k+1)m})$.
The complex transmitted signal $s_k = e(\mathbf{d}_k)$ is then chosen
from an $M$-ary constellation $\Psi$, where $e$ denotes the mapping
scheme. For simplicity, we consider transmission over the AWGN
channel. The received signal reads:



$$ y_k = s_k + n_k, \qquad 1 \leq k \leq L_c/m \qquad (1) $$

where $n_k$ is a complex white Gaussian noise with independent
in-phase and quadrature components having two-sided power spectral
density $\sigma^2$.
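The transmission chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the convolutional encoder is omitted (the coded bits are taken as given), the constellation is a unit-energy PSK with natural binary labelling rather than the labelling used in the simulations, and the names `psk_constellation`, `bits_to_label` and `transmit` are hypothetical helpers, not part of the original work.

```python
import cmath
import math
import random


def psk_constellation(m):
    """Return the M = 2**m points of a unit-energy M-PSK constellation."""
    M = 2 ** m
    return [cmath.exp(2j * math.pi * i / M) for i in range(M)]


def bits_to_label(bits):
    """Read a sequence of bits (most significant first) as an integer label."""
    label = 0
    for bit in bits:
        label = (label << 1) | bit
    return label


def transmit(coded_bits, perm, m, sigma, rng):
    """Interleave, map groups of m bits to symbols, and add complex noise.

    perm[n] is the index of the coded bit placed at position n of the
    interleaved sequence d (an assumed convention). Each noise component
    is Gaussian with standard deviation sigma.
    """
    constellation = psk_constellation(m)
    d = [coded_bits[p] for p in perm]
    received = []
    for k in range(len(d) // m):
        s_k = constellation[bits_to_label(d[k * m:(k + 1) * m])]
        n_k = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        received.append(s_k + n_k)          # equation (1)
    return d, received


rng = random.Random(0)
c = [rng.randint(0, 1) for _ in range(12)]
perm = list(range(12))
rng.shuffle(perm)
d, y = transmit(c, perm, m=3, sigma=0.1, rng=rng)
```

With `m=3` the twelve coded bits yield four received 8-PSK samples.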

Due to the presence of the random bit interleaver, true maximum
likelihood decoding of BICM is too complicated to implement in
practice. Figure 2 shows the block diagram of the receiver for a
BICM-ID system with soft-decision feedback.

Fig. 1. Transmission model

Fig. 2. Receiver for a BICM-ID with soft-decision feedback



In the first iteration, the encoded bits are assumed equally
likely. The demapping consists in evaluating a posteriori
probabilities (APP) for the encoded bits without accounting for the code
structure, namely:

$$ p_{APP}(d_{km+i} = b) \propto \sum_{\mathbf{s}:\, s_k \in \Psi_b^i} p(\mathbf{y}|\mathbf{s})\, p(\mathbf{s}) \qquad (2) $$
$$ \propto \sum_{s_k \in \Psi_b^i} p(y_k|s_k)\, p(s_k) \qquad (3) $$

where $\mathbf{s} = \{s_1, \ldots, s_{L_c/m}\}$, $\mathbf{y} = \{y_1, \ldots, y_{L_c/m}\}$,
$b \in \{0, 1\}$, and $\Psi_b^i$ denotes the subset of $\Psi$ that
contains all symbols whose labels have the value $b$ in the $i$-th
position. In the turbo decoding process, the quantities exchanged
through the blocks are not a posteriori probabilities (APP) but
extrinsic information [12]. The extrinsic information at the output of
the demapping, $p(d_{km+i}; O)$, is computed as
$p_{APP}(d_{km+i}) / p(d_{km+i}; I)$, where $p(d_{km+i}; I)$ is the a priori
information for the demapping sub-block. Since the bit interleaver
makes the bits independent, the extrinsic information $p(d_{km+i}; O)$
reads:

$$ p(d_{km+i} = b; O) = K_m \sum_{s_k \in \Psi_b^i} p(y_k|s_k) \prod_{j \neq i} p(d_{km+j}; I) \qquad (4) $$

and the corresponding APP reads:

$$ p_{APP}(d_{km+i} = b) = K'_m \sum_{s_k \in \Psi_b^i} p(y_k|s_k) \prod_{j} p(d_{km+j}; I) \qquad (5) $$

where $K_m$ and $K'_m$ are normalization factors. The extrinsic
information $p(d_{km+i}; O)$ is de-interleaved and delivered to the SISO
decoder [13] as a priori information on the encoded bits. Let
$c_l = d_{\sigma^{-1}(km+i)}$, where $\sigma^{-1}$ stands for the
permutation on the indexes due to the deinterleaver; $p(c_l; I)$ is the
updated input of the Soft-Input Soft-Output (SISO) decoder. The
extrinsic information at the output of the SISO decoder is obtained
through [8], [14]:

$$ p(c_l = b; O) = K_c \sum_{\mathbf{c} \in \mathcal{R}_b^l} I_C(\mathbf{c}) \prod_{j \neq l} p(c_j; I) \qquad (6) $$

and the corresponding APP is:

$$ p_{APP}(c_l = b) = K'_c \sum_{\mathbf{c} \in \mathcal{R}_b^l} I_C(\mathbf{c}) \prod_{j} p(c_j; I) \qquad (7) $$

where $I_C(\mathbf{c})$ stands for the indicator function of the code, i.e.
$I_C(\mathbf{c}) = 1$ if $\mathbf{c}$ is a codeword and $0$ otherwise, and
$\mathcal{R}_b^l$ denotes the set of binary words of length $L_c$ with
value $b$ in the $l$-th position. $K_c$ and $K'_c$ are normalization
factors. The extrinsic information $p(c_l; O)$ is interleaved and
delivered to the demapping sub-block as a regenerated a priori
information. If the process converges, the APP of the two sub-blocks
are the same. The criteria proposed in the following are based on this
property and encourage a faster convergence towards this objective.
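The demapper rules (4) and (5) can be illustrated for a single received sample. The sketch below is a toy under stated assumptions: the function name `demap_symbol`, the most-significant-bit-first labelling, and the Gaussian likelihood with variance `sigma**2` per component are illustrative choices, and the explicit normalisation stands in for $K_m$ and $K'_m$.

```python
import cmath
import math


def demap_symbol(y_k, priors, constellation, sigma):
    """Return (extrinsic, app): probabilities that each of the m bits is 1.

    priors[j] is the a priori probability p(d_{km+j} = 1; I).
    constellation[label] is the symbol whose m-bit label is `label`
    (most significant bit first, an assumed convention).
    """
    m = len(priors)
    ext = [[0.0, 0.0] for _ in range(m)]   # ext[i][b], equation (4)
    app = [[0.0, 0.0] for _ in range(m)]   # app[i][b], equation (5)
    for label, s_k in enumerate(constellation):
        bits = [(label >> (m - 1 - j)) & 1 for j in range(m)]
        # p(y_k | s_k) up to a constant factor
        likelihood = math.exp(-abs(y_k - s_k) ** 2 / (2.0 * sigma ** 2))
        bit_priors = [priors[j] if bits[j] else 1.0 - priors[j] for j in range(m)]
        for i in range(m):
            # product of the priors of all bits except bit i
            others = math.prod(bit_priors[j] for j in range(m) if j != i)
            ext[i][bits[i]] += likelihood * others
            app[i][bits[i]] += likelihood * others * bit_priors[i]
    # Normalise so that the two hypotheses b = 0 and b = 1 sum to one
    ext_one = [e[1] / (e[0] + e[1]) for e in ext]
    app_one = [a[1] / (a[0] + a[1]) for a in app]
    return ext_one, app_one


constellation = [cmath.exp(2j * math.pi * i / 8) for i in range(8)]
extrinsic, app = demap_symbol(constellation[5] + 0.1, [0.7, 0.2, 0.6],
                              constellation, sigma=0.5)
```

The APP of each bit is, up to normalisation, the product of its prior and its extrinsic information, which is the relation used in Section 3.2.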



3. NOTATIONS FROM INFORMATION GEOMETRY 
3.1. Basic tools 

We first introduce some notations that will be useful in the sequel.
Let $B_i \in \{0, 1\}^N$ denote the binary representation of the integer
$i$, $0 \leq i \leq 2^N - 1$. The binary representation of all the words
of length $N$ is gathered into the matrix
$B = (B_0, B_1, \ldots, B_{2^N-1})^T$ with dimension $2^N \times N$.
Let $\eta$ be a probability mass function (PMF) on the outcomes
$\chi = B_i$; then

$$ \eta = (\Pr[\chi = B_0], \Pr[\chi = B_1], \ldots, \Pr[\chi = B_{2^N-1}])^T $$

Given a PMF $\eta$, its log-coordinates are the vector $\theta$ whose
$i$-th element is given by
$\theta_i = \ln(\Pr[\chi = B_i]) - \ln(\Pr[\chi = B_0])$.
We can observe that there is a one-to-one mapping between $\eta$ and
$\theta$, since the vector $\eta$ can be written
$\eta = \exp(\theta - \psi(\theta))$ where
$\psi(\theta) = \log\left(\sum_i \exp(\theta_i)\right)$. We also introduce
the bitwise log-probability ratios with elements of the form

$$ \lambda_j = \log\left(\frac{\Pr[\chi_j = 1]}{\Pr[\chi_j = 0]}\right) $$

where $\chi_j$ is the $j$-th bit of the binary word $\chi$ and
$\lambda \in \mathbb{R}^N$. For factorisable probability measures (i.e.
PMFs that factor into the product of their bitwise marginals, so that
$\Pr(\chi) = \prod_j \Pr(\chi_j)$), the log-coordinates take the form
$\theta = B\lambda$.
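As a small numerical check of these definitions, the sketch below builds the matrix $B$ for $N = 3$, computes the log-coordinates of a factorisable PMF and compares them with $B\lambda$. The helper names are hypothetical and the bit probabilities are arbitrary illustrative values; the enumeration of all $2^N$ words is only practical for very small $N$.

```python
import itertools
import math


def binary_matrix(N):
    """Rows B_0, ..., B_{2^N - 1}: binary representations of 0 .. 2^N - 1."""
    return [list(bits) for bits in itertools.product((0, 1), repeat=N)]


def log_coordinates(eta):
    """theta_i = ln(eta_i) - ln(eta_0) for a strictly positive PMF eta."""
    return [math.log(e) - math.log(eta[0]) for e in eta]


def pmf_from_log_coordinates(theta):
    """Invert the map: eta = exp(theta - psi(theta))."""
    psi = math.log(sum(math.exp(t) for t in theta))
    return [math.exp(t - psi) for t in theta]


def factorised_pmf(bit_probs):
    """PMF on all words when bit j is 1 with probability bit_probs[j], independently."""
    rows = binary_matrix(len(bit_probs))
    return [math.prod(p if b else 1.0 - p for p, b in zip(bit_probs, row))
            for row in rows]


bit_probs = [0.9, 0.3, 0.6]
lam = [math.log(p / (1.0 - p)) for p in bit_probs]      # bitwise log-ratios
theta = log_coordinates(factorised_pmf(bit_probs))
B_lam = [sum(b * l for b, l in zip(row, lam)) for row in binary_matrix(3)]
# theta and B_lam agree up to rounding, illustrating theta = B * lambda
```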



3.2. Link with iterative decoding 

Let $\theta_m$ denote the log-coordinates vector of the PMF
$p(\mathbf{y}|\mathbf{s})$. Let $\lambda_1$ denote the log-probability
ratio corresponding to the prior $p(d_{km+i}; I)$ such that:

$$ (\lambda_1)_{km+i} = \ln\left(\frac{p(d_{km+i} = 1; I)}{p(d_{km+i} = 0; I)}\right) $$

Thus, the log-coordinates of
$p(\mathbf{y}|\mathbf{s}) \prod_{k,i} p(d_{km+i}; I)$ read
$B\lambda_1 + \theta_m$.

Let $p_{B\lambda_1+\theta_m}$ represent the vector whose $i$-th element
is the probability that the $i$-th bit is 1 according to the measure
with log-coordinates $B\lambda_1 + \theta_m$. The APP at the output
of the demapper coincide with $p_{B\lambda_1+\theta_m}$. From eqs. (4)-(5),
$p_{APP}(d_{km+i} = b) = p(d_{km+i} = b; I)\, p(d_{km+i} = b; O)$, so the
log-coordinates of the APP at the output of the demapper also coincide
with $B(\lambda_1 + \lambda_2)$, where

$$ (\lambda_2)_{km+i} = \ln\left(\frac{p(d_{km+i} = 1; O)}{p(d_{km+i} = 0; O)}\right) $$

Then the demapper sub-block solves, with respect to $\lambda_2$, the
equation:

$$ p_{B(\lambda_1+\lambda_2)} = p_{B\lambda_1+\theta_m} \qquad (8) $$

Let $\theta_c$ denote the log-coordinates of the PMF associated with the
indicator function. Then the decoder sub-block solves, with respect
to $\lambda_1$, the equation:

$$ p_{B(\lambda_1+\lambda_2)} = p_{B\lambda_2+\theta_c} \qquad (9) $$

Iterative decoding is thus equivalent to:

find $\lambda_2^{(k+1)}$ such that $p_{B(\lambda_1^{(k)}+\lambda_2^{(k+1)})} = p_{B\lambda_1^{(k)}+\theta_m}$
find $\lambda_1^{(k+1)}$ such that $p_{B(\lambda_1^{(k+1)}+\lambda_2^{(k+1)})} = p_{B\lambda_2^{(k+1)}+\theta_c}$

At convergence, the APP from the two sub-blocks should be in
accordance, i.e.
$p_{B(\lambda_1^{(\infty)}+\lambda_2^{(\infty)})} = p_{B\lambda_1^{(\infty)}+\theta_m} = p_{B\lambda_2^{(\infty)}+\theta_c}$.
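Equation (8) can be made concrete on a toy example. The sketch below computes the bit marginals of the measure with log-coordinates $B\lambda_1 + \theta_m$ by brute-force enumeration and then reads off $\lambda_2$ from the fact that, for a factorisable measure, the $j$-th log-ratio equals $\ln(p_j/(1-p_j))$. The function names and the numerical values of `lam1` and `theta_m` are illustrative assumptions; the enumeration is exponential in $N$ and is not how a practical demapper operates.

```python
import itertools
import math


def bit_marginals(theta, N):
    """Pr[bit j = 1] under the PMF with log-coordinates theta (length 2**N)."""
    words = list(itertools.product((0, 1), repeat=N))
    shift = max(theta)                       # avoids overflow in exp
    weights = [math.exp(t - shift) for t in theta]
    total = sum(weights)
    return [sum(w for w, x in zip(weights, words) if x[j]) / total
            for j in range(N)]


def solve_lambda2(lam1, theta_m):
    """Solve p_{B(lam1 + lam2)} = p_{B lam1 + theta_m} for lam2, as in (8)."""
    N = len(lam1)
    words = list(itertools.product((0, 1), repeat=N))
    theta = [sum(b * l for b, l in zip(x, lam1)) + t
             for x, t in zip(words, theta_m)]
    target = bit_marginals(theta, N)
    # For a factorisable measure, (lam1 + lam2)_j = ln(p_j / (1 - p_j))
    return [math.log(p / (1.0 - p)) - l for p, l in zip(target, lam1)]


lam1 = [0.3, -0.2]
theta_m = [0.0, 0.4, -1.0, 0.7]
lam2 = solve_lambda2(lam1, theta_m)
```

The decoder equation (9) has the same structure with $\theta_c$ in place of $\theta_m$ and the roles of $\lambda_1$ and $\lambda_2$ exchanged.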



4. AN OPTIMIZATION PROBLEM 

The Fermi-Dirac divergence is the Bregman divergence built on the
Fermi-Dirac entropy $f(p) = \sum_j p_j \ln(p_j) + (1 - p_j) \ln(1 - p_j)$
with $\mathrm{dom}(f) = [0; 1]$. The Fermi-Dirac divergence reads

$$ D_{FD}(p, q) = \sum_j p_j \ln\left(\frac{p_j}{q_j}\right) + \sum_j (1 - p_j) \ln\left(\frac{1 - p_j}{1 - q_j}\right) $$

and is exactly the Kullback-Leibler distance for bit probabilities.
The Fermi-Dirac divergence is a non-symmetric distance. As we
can notice, this distance is very convenient for computing distances
between bit probabilities.
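A direct implementation of this divergence is short. The sketch below assumes all probabilities lie strictly inside $(0, 1)$ so that every logarithm is finite; the function name and the sample vectors are illustrative.

```python
import math


def fermi_dirac_divergence(p, q):
    """D_FD(p, q) for two vectors of bit probabilities strictly inside (0, 1)."""
    total = 0.0
    for pj, qj in zip(p, q):
        total += pj * math.log(pj / qj)
        total += (1.0 - pj) * math.log((1.0 - pj) / (1.0 - qj))
    return total


p = [0.9, 0.2, 0.5]
q = [0.6, 0.4, 0.5]
d_pq = fermi_dirac_divergence(p, q)
d_qp = fermi_dirac_divergence(q, p)
# d_pq and d_qp differ, which illustrates the lack of symmetry
```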



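Proposition 1 The demapping sub-block solves the minimization
problem
$$ \min_{\lambda_2} D_{FD}(p_{B\lambda_1+\theta_m}, p_{B(\lambda_1+\lambda_2)}) $$
The decoding sub-block solves the minimization problem
$$ \min_{\lambda_1} D_{FD}(p_{B\lambda_2+\theta_c}, p_{B(\lambda_1+\lambda_2)}) $$

Proof: The proof is obvious by noting that
$(\lambda_1 + \lambda_2)_{km+i} = \ln\left(\frac{p_{B(\lambda_1+\lambda_2)}(km+i)}{1 - p_{B(\lambda_1+\lambda_2)}(km+i)}\right)$,
so that the divergence, which is non-negative, vanishes exactly when
(8), respectively (9), holds.

This proposition illustrates that iterative decoding can be formulated
as two embedded minimization steps based on the Fermi-Dirac
distance. In the next section, we investigate some modifications of this
original criterion.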

4.1. A hybrid proximal point algorithm

In classical iterative decoding, the two minimization steps seem
independent, meaning that the minimization of one criterion
does not necessarily imply a decrease of the other criterion at the
next iteration. Proximal point methods [15] make it possible to link
the two criteria. These methods are generally used to guarantee
the monotonicity of the convergence process, often at the cost
of a slow convergence speed. Following the proximal point
technique, we obtain the minimization process:

$$ \lambda_2^{(k+1)} = \arg\min_{\lambda_2} J_{\theta_m}(\lambda_1, \lambda_2) = \arg\min_{\lambda_2} D_{FD}(p_{B\lambda_1+\theta_m}, p_{B(\lambda_1+\lambda_2)}) + \mu_m D_{FD}(p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})}, p_{B(\lambda_1+\lambda_2)}) $$

$$ \lambda_1^{(k+1)} = \arg\min_{\lambda_1} J_{\theta_c}(\lambda_1, \lambda_2) = \arg\min_{\lambda_1} D_{FD}(p_{B\lambda_2+\theta_c}, p_{B(\lambda_1+\lambda_2)}) + \mu_c D_{FD}(p_{B(\lambda_1^{(k)}+\lambda_2^{(k+1)})}, p_{B(\lambda_1+\lambda_2)}) $$

As can be seen, the original criterion is modified through the addition
of a penalization term in order to encourage smooth variations of
the successive estimates. This minimization process is equivalent to
finding $\lambda_2^{(k+1)}$ such that

$$ p_{B(\lambda_1^{(k)}+\lambda_2^{(k+1)})} = \frac{p_{B\lambda_1^{(k)}+\theta_m} + \mu_m\, p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})}}{1 + \mu_m} \qquad (10) $$

and $\lambda_1^{(k+1)}$ such that

$$ p_{B(\lambda_1^{(k+1)}+\lambda_2^{(k+1)})} = \frac{p_{B\lambda_2^{(k+1)}+\theta_c} + \mu_c\, p_{B(\lambda_1^{(k)}+\lambda_2^{(k+1)})}}{1 + \mu_c} \qquad (11) $$

Note that this new procedure also converges towards solutions
satisfying (8) and (9). A good choice for the parameters $\mu_m$ and
$\mu_c$ ensures that each criterion decreases with the iterations.
Actually, we want to enforce
$J_{\theta_m}(\lambda_1^{(k)}, \lambda_2^{(k+1)}) < J_{\theta_c}(\lambda_1^{(k)}, \lambda_2^{(k)})$.
Since the Fermi-Dirac distance is convex with respect to its second
parameter, we have

$$ J_{\theta_m}(\lambda_1^{(k)}, \lambda_2^{(k+1)}) < \frac{\mu_m}{1 + \mu_m} \left( D_{FD}(p_{B\lambda_1^{(k)}+\theta_m}, p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})}) + D_{FD}(p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})}, p_{B\lambda_1^{(k)}+\theta_m}) \right) $$

Moreover, we also have
$D_{FD}(p_{B\lambda_2^{(k)}+\theta_c}, p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})}) \leq J_{\theta_c}(\lambda_1^{(k)}, \lambda_2^{(k)})$.
Connecting the two relations, we obtain an upper bound for $\mu_m$:

$$ \mu_m < \frac{D_{FD}(p_{B\lambda_2^{(k)}+\theta_c}, p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})})}{\mathcal{D}_{FD} - D_{FD}(p_{B\lambda_2^{(k)}+\theta_c}, p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})})} $$

where $\mathcal{D}_{FD}$ is a symmetric distance, namely

$$ \mathcal{D}_{FD} = D_{FD}(p_{B\lambda_1^{(k)}+\theta_m}, p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})}) + D_{FD}(p_{B(\lambda_1^{(k)}+\lambda_2^{(k)})}, p_{B\lambda_1^{(k)}+\theta_m}) $$

The upper bound for $\mu_c$ can be obtained in the same way. Iterating
(10) and (11) with $\mu_c$ and $\mu_m$ correctly chosen yields an
algorithm that converges towards the same points as classical iterative
decoding, with the additional advantage of decreasing a desired
criterion at each iteration. In the next section, we propose a new
criterion in order to improve the performance of the iterative decoding.

4.2. A hybrid minimum entropy algorithm

The entropy of the vector of marginals $p_{B(\lambda_1+\lambda_2)}$ is
defined as

$$ E_{B(\lambda_1+\lambda_2)} = -\sum_n p_{B(\lambda_1+\lambda_2)}(n) \log_2(p_{B(\lambda_1+\lambda_2)}(n)) - \sum_n (1 - p_{B(\lambda_1+\lambda_2)}(n)) \log_2(1 - p_{B(\lambda_1+\lambda_2)}(n)) $$

The quantity $E_{B(\lambda_1+\lambda_2)}$ gives a measure of the
reliability of the decisions. Indeed, $E_{B(\lambda_1+\lambda_2)} \to 0$
does not always mean that the decisions are correct, but rather that the
iterative decoding algorithm is confident about its decisions.
Nevertheless, in iterative decoding, the decisions are in most cases
correct when $E_{B(\lambda_1+\lambda_2)} \to 0$ [16]. In this section, we
propose a new criterion that minimizes $E_{B(\lambda_1+\lambda_2)}$ under
the constraint
$D_{FD}(p_{B\lambda_1+\theta_m}, p_{B(\lambda_1+\lambda_2)}) < \epsilon$
for the demapping and
$D_{FD}(p_{B\lambda_2+\theta_c}, p_{B(\lambda_1+\lambda_2)}) < \epsilon$
for the decoding. This is equivalent to:

$$ \lambda_2^{(k+1)} = \arg\min_{\lambda_2} D_{FD}(p_{B\lambda_1+\theta_m}, p_{B(\lambda_1+\lambda_2)}) + \eta_m E_{B(\lambda_1+\lambda_2)} \qquad (12) $$

$$ \lambda_1^{(k+1)} = \arg\min_{\lambda_1} D_{FD}(p_{B\lambda_2+\theta_c}, p_{B(\lambda_1+\lambda_2)}) + \eta_c E_{B(\lambda_1+\lambda_2)} \qquad (13) $$

By zeroing the gradient of the two criteria in (12) and (13), we obtain
the new update equations:

$$ \lambda_2^{(k+1)}: \quad f_{\eta_m}(p_{B(\lambda_1^{(k)}+\lambda_2^{(k+1)})}(n)) = p_{B\lambda_1^{(k)}+\theta_m}(n), \qquad 1 \leq n \leq L_c $$

$$ \lambda_1^{(k+1)}: \quad f_{\eta_c}(p_{B(\lambda_1^{(k+1)}+\lambda_2^{(k+1)})}(n)) = p_{B\lambda_2^{(k+1)}+\theta_c}(n), \qquad 1 \leq n \leq L_c $$

where

$$ f_\eta(p_{B(\lambda_1+\lambda_2)}(n)) = p_{B(\lambda_1+\lambda_2)}(n) - \eta\, p_{B(\lambda_1+\lambda_2)}(n) (1 - p_{B(\lambda_1+\lambda_2)}(n)) \log_2\left(\frac{p_{B(\lambda_1+\lambda_2)}(n)}{1 - p_{B(\lambda_1+\lambda_2)}(n)}\right) $$

The function $f_\eta$ is plotted in Fig. 3. We can notice that: (i) the
distortion increases with $\eta$; (ii) $f_\eta(p)$ belongs to $[0; 1]$;
(iii) $f_\eta(p)$ is a strictly increasing function. As a consequence,
each step of the minimization process has a unique solution that can be
found using classical techniques.

Fig. 3. $f_\eta(p)$ for various values of $\eta$
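The two families of updates in this section can be sketched at the level of bit marginals. The proximal update is a convex combination of the target marginals and the previous ones, while the minimum entropy update requires inverting $f_\eta$ componentwise. Bisection is used here as one possible "classical technique"; the source does not say which solver was used. The function names are hypothetical, and the base-2 logarithm in `f_eta` is an assumption chosen to match the entropy definition.

```python
import math


def hpp_update(target, previous, mu):
    """Proximal step: convex combination of the target and previous marginals."""
    return [(t + mu * q) / (1.0 + mu) for t, q in zip(target, previous)]


def f_eta(p, eta):
    """Distortion f_eta(p) = p - eta * p * (1 - p) * log2(p / (1 - p))."""
    if p <= 0.0 or p >= 1.0:
        return p                      # limits of the expression at 0 and 1
    return p - eta * p * (1.0 - p) * math.log2(p / (1.0 - p))


def hmea_update(target, eta, iterations=60):
    """Solve f_eta(p) = target componentwise by bisection on [0, 1].

    Relies on f_eta being strictly increasing on [0, 1].
    """
    solution = []
    for t in target:
        low, high = 0.0, 1.0
        for _ in range(iterations):
            mid = 0.5 * (low + high)
            if f_eta(mid, eta) < t:
                low = mid
            else:
                high = mid
        solution.append(0.5 * (low + high))
    return solution


target = [0.5, 0.8, 0.1]
sharpened = hmea_update(target, eta=0.05)
smoothed = hpp_update(target, previous=[0.5, 0.4, 0.3], mu=1.0)
```

In this toy run the minimum entropy update pushes marginals away from 0.5, towards more confident decisions, whereas the proximal update pulls them towards the previous iterate.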



5. SIMULATION 






We compare the performance, in terms of bit error rate (BER) and
iteration number, of classical iterative decoding with the hybrid
proximal point algorithm (HPP) and with the hybrid minimum entropy
algorithm (HMEA). Each algorithm stops when the Fermi-Dirac
distance between the APP of the two sub-blocks is less than 10 or
when 30 iterations are reached. The generator polynomial of the
encoder is $g = [111; 001; 100]$. The bits are mapped using subset
partitioning to an 8-PSK modulation. The length of the coded bit
sequence is $L_c = 6000$. The step-sizes $\eta_m$ and $\eta_c$ in the
HMEA are both chosen equal to 0.05. The results are plotted in Figs. 4
and 5. We can see that classical iterative decoding and the HPP exhibit
exactly the same performance. This is not surprising concerning the
BER, since both methods converge towards the same points. We can
also notice that these results are obtained with the same number of
iterations in both cases, meaning that the proximal point technique
does not reduce, in this case, the convergence speed. Both methods
have almost the same computational complexity, with the additional
advantage for the HPP of minimizing a desired criterion along the
iterations. As expected, the HMEA outperforms the other methods
in terms of BER in the middle area, with a number of iterations at
most equal to the number of iterations needed in the classical BICM.
However, this last method has a higher computational complexity
due to the distortion function $f_\eta$.
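The stopping rule used in these experiments can be sketched as a generic loop. This is a skeleton under stated assumptions: `demapper_step` and `decoder_step` are hypothetical callables standing for any of the three algorithms, the order of the two arguments of the divergence is an arbitrary choice, and the threshold `tol` is left as a required parameter.

```python
import math


def fermi_dirac_divergence(p, q):
    """D_FD(p, q) for bit probabilities strictly inside (0, 1)."""
    return sum(pj * math.log(pj / qj)
               + (1.0 - pj) * math.log((1.0 - pj) / (1.0 - qj))
               for pj, qj in zip(p, q))


def run_decoder(demapper_step, decoder_step, lam1, lam2, tol, max_iterations=30):
    """Alternate the two sub-block updates until their APPs agree.

    demapper_step(lam1, lam2) -> (new_lam2, app_demapper)
    decoder_step(lam1, lam2)  -> (new_lam1, app_decoder)
    Both callables are hypothetical placeholders for the update rules.
    Returns the final log-ratios and the number of iterations performed.
    """
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        lam2, app_demapper = demapper_step(lam1, lam2)
        lam1, app_decoder = decoder_step(lam1, lam2)
        if fermi_dirac_divergence(app_demapper, app_decoder) < tol:
            break
    return lam1, lam2, iteration
```

Counting the iterations returned by such a loop for each algorithm gives the kind of quantity reported as the iteration number.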
Fig. 5. Iteration number versus Eb/N0



6. CONCLUSION 

In this paper, iterative decoding is rephrased as two embedded
minimization processes. From this formulation, we have derived a
hybrid proximal point algorithm that exhibits the same performance
as classical iterative decoding. This proximal point algorithm
decreases a well-identified criterion at each step. We have also built
a hybrid minimum entropy algorithm. The minimization of the
entropy leads to an improvement of the performance.